Visualising phylogenies in R

Natalie Cooper (natalie.cooper@nhm.ac.uk)

1. Introduction

This is a quick practical to introduce you to some of the fun ways you can visualise phylogenetic trees in R. The code and examples are based on Liam Revell’s book chapter here. I am assuming that people have little or no experience with R so more experienced users may want to skim quickly through the first few sections.

Throughout, R code will be in shaded boxes:

library(ape)

R output will be preceded by ## and important comments will be in quote blocks:

Note that many things in R can be done in multiple ways. You should choose the methods you feel most comfortable with, and do not panic if someone is doing the same analyses as you in a different way!


2. Preparations

Downloading R (and RStudio)

If you don’t already have it you’ll need to download R from CRAN. I also recommend using RStudio but you don’t need this.

Downloading the data and phylogeny

First you need to download all the files for this practical into a folder somewhere on your computer (I usually put mine on the Desktop). You’ll need to know what the path of the folder is. For example on my Windows machine, the path is:

C:/Users/Natalie/Desktop/RAnalyses

The path is really easy to find in a Windows machine, just click on the address bar of the folder and the whole path will appear.

On my Mac the path is:

~/Desktop/RAnalyses

It’s a bit trickier to find the path on a Mac, so ask if you need help. Note that the tilde ~ is a shorthand for /Users/Natalie.

Using a script

Next open R (or RStudio) and then open a text editor. R has an inbuilt editor that works pretty well, but NotePad and TextEdit are fine too. However, in the future I highly recommend using something that will highlight code for you. My personal favorite is Sublime Text 2, because you can also use it for any other kind of text editing like LaTeX, html etc. RStudio’s editor is also very nice.

You should type (or copy and paste) your code into the text editor, edit it until you think it’ll work, and then paste it into R’s console window. Saving the script file lets you keep a record of the code you used, which can be a great timesaver if you want to use it again, especially as you know this code will work!

You can cut and paste code from this handout into your script. You don’t need to retype everything!

If you want to add comments to the file (i.e., notes to remind yourself what the code is doing), put a hash/pound sign (#) in front of the comment.

# Comments are ignored by R but remind you what the code is doing. 
# You need a # at the start of each line of a comment.
# Always make plenty of notes to help you remember what you did and why

Setting the working directory

We can the set the working directory to the folder containing your data using setwd:

setwd("~/Desktop/RAnalyses")

Setting the working directory tells R which folder to look for data in (and which folder you’d like it to write results to). It saves a bit of typing when reading files into R. Now I can read in a file called mydata.csv as follows:

mydata <- read.csv("mydata.csv")

rather than having to specify the folder too:

mydata <- read.csv("~/Desktop/RAnalyses/mydata.csv")

If you’d rather specify the path and not set the working directory that’s fine.

Installing and loading extra packages in R

To plot phylogenies (or use any specialized analysis) in R, you need to download one or more additional packages from the basic R installation. For this practical you will need to install the following packages:

  • ape
  • phytools

To install the package ape:

install.packages("ape")

Pick the closest mirror to you if asked. Now install phytools.

You’ve installed the packages but they don’t automatically get loaded into your R session. Instead you need to tell R to load them every time you start a new R session and want to use functions from these packages. To load the package ape into your current R session:

library(ape)

You can think of install.packages like installing an app from the App Store on your smart phone - you only do this once - and library as being like pushing the app button on your phone - you do this every time you want to use the app.

Don’t forget to load phytools too!

library(phytools)

3. Loading your phylogeny and data into R

Reading in a phylogeny from a file

To load a tree you need either the function read.tree or read.nexus. read.tree can deal with a number of different types of data (including DNA) whereas read.nexus reads NEXUS files. elopomorph.tre is not in NEXUS format so we use read.tree.

fishtree <- read.tree("elopomorph.tre")

Reading in a phylogeny that is already built into R

The bird and anole phylogenies are already built into R so we don’t need to read them in using read.tree. Instead we just use:

data(bird.orders)
data(anoletree)

Reading and viewing your data in R

Later we will use some Greater Antillean Anolis lizard data to add data to a phylogeny. Before we can add data to our tree, we need to load the data we are going to use. R can read files in lots of formats, including comma-delimited and tab-delimited files. Excel (and many other applications) can output files in this format (it’s an option in the Save As dialog box under the File menu). To save time I have given you a comma-delimited text file called anole.data.csv which we are going to use. Load these data as follows. I am assuming you have set your working directory, if not don’t forget the path.

anoledata <- read.csv("anole.data.csv", header = TRUE)

You can use read.delim for tab delimited files or read.csv for comma delimited files. header = TRUE, indicates that the first line of the data contains column headings.

This is a good point to note that unless you R you want to do something, it won’t do it automatically. So here if you successfully entered the data, R won’t give you any indication that it worked. Instead you need to specifically ask R to look at the data.

We can look at the data by typing:

str(anoledata)
## 'data.frame':    100 obs. of  23 variables:
##  $ species               : Factor w/ 100 levels "ahli","alayoni",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ AVG.SVL               : num  4.04 3.82 3.53 4.04 4.38 ...
##  $ AVG.hl                : num  2.88 2.7 2.38 2.9 3.36 ...
##  $ AVG.hw                : num  2.36 1.99 1.56 2.37 2.69 ...
##  $ AVG.hh                : num  2.13 1.75 1.39 2.05 2.32 ...
##  $ AVG.ljl               : num  2.85 2.71 2.32 2.9 3.38 ...
##  $ AVG.outlever          : num  2.75 2.62 2.26 2.83 3.29 ...
##  $ AVG.jugal.to.symphysis: num  2.54 2.37 2.08 2.6 3.07 ...
##  $ AVG.femur             : num  2.74 2.07 2.17 2.48 2.8 ...
##  $ AVG.tibia             : num  2.69 2.02 2.09 2.34 2.69 ...
##  $ AVG.met               : num  2.25 1.54 1.55 1.87 2.18 ...
##  $ AVG.ltoe.IV           : num  2.55 1.88 1.73 2.26 2.53 ...
##  $ AVG.toe.IV.lam.width  : num  0.1795 0.0488 -0.5361 0.4904 0.8441 ...
##  $ AVG.humerus           : num  2.46 1.95 1.63 2.3 2.62 ...
##  $ AVG.radius            : num  2.27 1.69 1.4 2.09 2.34 ...
##  $ AVG.lfing.IV          : num  1.94 1.4 1.04 1.7 1.98 ...
##  $ AVG.fing.IV.lam.width : num  0.0754 -0.0739 -0.755 0.3155 0.6584 ...
##  $ AVG.pelv.ht           : num  1.99 1.51 1.19 1.87 2.1 ...
##  $ AVG.pelv.wd           : num  1.71 1.419 0.946 1.752 2.014 ...
##  $ Foot.Lam.num          : num  3.28 3.43 3.2 3.58 3.72 ...
##  $ Hand.Lam.num          : num  2.87 3.08 2.73 3.16 3.24 ...
##  $ Avg.lnSVL2            : num  3.94 3.74 3.48 3.93 4.36 ...
##  $ Avg.ln.t1             : num  4.41 4 4.37 4.44 5.04 ...

Always look at your data before beginning any analysis to check it read in correctly.

str shows the structure of the data frame (this can be a really useful command when you have a big data file). It also tells you what kind of variables R thinks you have (characters, integers, numeric, factors etc.). Some R functions need the data to be certain kinds of variables so it’s useful to check this.

head(anoledata)
##    species  AVG.SVL   AVG.hl   AVG.hw   AVG.hh  AVG.ljl AVG.outlever
## 1     ahli 4.039125 2.882657 2.356126 2.131599 2.854745     2.754934
## 2  alayoni 3.815705 2.702116 1.990610 1.751588 2.708966     2.617852
## 3  alfaroi 3.526655 2.378156 1.556037 1.390037 2.323857     2.263844
## 4 aliniger 4.036557 2.898836 2.366592 2.046660 2.903946     2.828555
## 5 allisoni 4.375390 3.358957 2.690339 2.317309 3.381306     3.288402
## 6  allogus 4.040138 2.861027 2.351275 2.142107 2.848331     2.750404
##   AVG.jugal.to.symphysis AVG.femur AVG.tibia  AVG.met AVG.ltoe.IV
## 1               2.538052  2.741378  2.686032 2.254620    2.553344
## 2               2.371995  2.065121  2.016402 1.535253    1.875258
## 3               2.077565  2.172476  2.094946 1.547563    1.732540
## 4               2.596821  2.475445  2.344015 1.867176    2.256541
## 5               3.071149  2.795145  2.687734 2.178155    2.533697
## 6               2.539320  2.739175  2.684476 2.217314    2.494900
##   AVG.toe.IV.lam.width AVG.humerus AVG.radius AVG.lfing.IV
## 1           0.17953991    2.460728   2.268511     1.943288
## 2           0.04879016    1.947873   1.687093     1.403029
## 3          -0.53614343    1.628260   1.401183     1.040277
## 4           0.49041881    2.298878   2.091864     1.702199
## 5           0.84407840    2.618551   2.341565     1.983412
## 6           0.19625907    2.459994   2.267176     1.919010
##   AVG.fing.IV.lam.width AVG.pelv.ht AVG.pelv.wd Foot.Lam.num Hand.Lam.num
## 1            0.07541664    1.987646   1.7095849     3.279600     2.866197
## 2           -0.07391568    1.507128   1.4194873     3.432372     3.075269
## 3           -0.75502258    1.189367   0.9458495     3.198016     2.733866
## 4            0.31554040    1.867949   1.7521520     3.582875     3.156774
## 5            0.65838319    2.097302   2.0142361     3.721185     3.239211
## 6            0.05329130    2.008426   1.6875679     3.335990     2.808270
##   Avg.lnSVL2 Avg.ln.t1
## 1   3.939015  4.406652
## 2   3.743863  4.002788
## 3   3.478776  4.369448
## 4   3.933181  4.441203
## 5   4.355582  5.039851
## 6   4.028863  4.510920

This gives you the first few rows of data along with the column headings.

names(anoledata)
##  [1] "species"                "AVG.SVL"               
##  [3] "AVG.hl"                 "AVG.hw"                
##  [5] "AVG.hh"                 "AVG.ljl"               
##  [7] "AVG.outlever"           "AVG.jugal.to.symphysis"
##  [9] "AVG.femur"              "AVG.tibia"             
## [11] "AVG.met"                "AVG.ltoe.IV"           
## [13] "AVG.toe.IV.lam.width"   "AVG.humerus"           
## [15] "AVG.radius"             "AVG.lfing.IV"          
## [17] "AVG.fing.IV.lam.width"  "AVG.pelv.ht"           
## [19] "AVG.pelv.wd"            "Foot.Lam.num"          
## [21] "Hand.Lam.num"           "Avg.lnSVL2"            
## [23] "Avg.ln.t1"

This gives you the names of the columns.

anoledata

This will print out all of the data!

4. Basic tree viewing in R

Now let’s visualise some phylogenies! We’ll use the Elopomorpha (eels and similar fishes) tree to start as it is simple.

fishtree <- read.tree("elopomorph.tre")

Let’s examine the tree by typing:

fishtree
## 
## Phylogenetic tree with 62 tips and 61 internal nodes.
## 
## Tip labels:
##  Moringua_edwardsi, Kaupichthys_nuchalis, Gorgasia_taiwanensis, Heteroconger_hassi, Venefica_proboscidea, Anguilla_rostrata, ...
## 
## Rooted; includes branch lengths.
str(fishtree)
## List of 4
##  $ edge       : int [1:122, 1:2] 63 64 64 65 66 67 68 68 69 70 ...
##  $ Nnode      : int 61
##  $ tip.label  : chr [1:62] "Moringua_edwardsi" "Kaupichthys_nuchalis" "Gorgasia_taiwanensis" "Heteroconger_hassi" ...
##  $ edge.length: num [1:122] 0.0105 0.0672 0.00537 0.00789 0.00157 ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

fishtree is a fully resolved tree with branch lengths. There are 62 species and 61 internal nodes. We can plot the tree by using the plot.phylo function of ape. Note that we can just use the function plot to do this as R knows if we ask it to plot a phylogeny to use plot.phylo instead!

plot(fishtree, cex = 0.5)

cex = 0.5 reduces the size of the tip labels so we can read them. We can also zoom into different sections of the tree that you’re interested in:

zoom(fishtree, grep("Gymnothorax", fishtree$tip.label), subtree = FALSE, cex = 0.8)

This just gives you the tree for Micropterus species but you can also see how the species fit into the rest of the tree using:

zoom(fishtree, grep("Gymnothorax", fishtree$tip.label), subtree = TRUE, cex = 0.8)

Note that zoom automatically sets the plotting window to display two plots at once. To reset this to one plot only use:

par(mfrow = c(1, 1))

To get further options for the plotting of phylogenies:

?plot.phylo

Note that although you can use plot to plot the phylogeny, you need to specify plot.phylo to find out the options for plotting trees. You can change the style of the tree (type), the color of the branches and tips (edge.color, tip.color), and the size of the tip labels (cex). Here’s an fun/hideous example!

plot(fishtree, type = "fan", edge.color = "deeppink", tip.color = "springgreen", 
     cex = 0.5)

Or try

plot(fishtree, type = "c", edge.color = "darkviolet", tip.color = "hotpink", 
     cex = 0.5)

5. Adding trait data to trees in R

A. Ancestral state reconstructions on discrete data

We will use the bird data Remember we already loaded the phylogeny and data as follows:

data(bird.orders)

First we will invent some data for each bird order that we can reconstruct along the tree. This creates three variables, 0, 1 and 2.

mydata <- factor(c(rep(0, 5), rep(1, 13), rep(2, 5)))
mydata
##  [1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
## Levels: 0 1 2

We can then use the ape function ace to reconstruct ancestral characters along the nodes of the tree. type = d means the character to be reconstructed is discrete.

ancestors <- ace(mydata, bird.orders, type = "d")

Now we can plot this on a phylogeny. First decide which colours we’d like. To look at a list of colours in R type in colors().

colours <- c("cornflowerblue", "cyan4", "goldenrod")

Now plot the tree and add labels to the tips and the nodes using the results in ancestors. We use label.offset = 1 to move the labels to the right a bit so the pie charts will fit…

plot(bird.orders, label.offset = 1)
tiplabels(pch = 21, bg = colours[as.numeric(mydata)], cex = 2, adj = 1)
nodelabels(pie = ancestors$lik.anc, piecol = colours)

pch = 21 sets the tip labels to be unfilled circles, bg defines the colours of the circles using the list of colours we provided, and ordering them based on what the species value was for mydata (i.e. 0, 1 or 2). cex = 2 doubles the point size, and adj = 1 moves the tip labels sideways so they don’t obscure the ends of the branches. pie makes pie charts coloured using the ancestral state reconstructions in ancestors, and piecol tells it to use the colours we have defined.

B. Ancestral state reconstructions on continuous data

We are going to use the Anolis data to create a phylogeny with different colours for different observed and reconstructed body sizes (snout-to-vent length, SVL). Remember we already loaded the phylogeny and data as follows:

data(anoletree)
anoledata <- read.csv("anole.data.csv", header = TRUE, row.names = 1) 

Note the names in anoledata are the species names without the Genus. In the phylogeny the species names are Anolis_species. So to get the two to match we need to add Anolis_ to each name.

rownames(anoledata) <- paste("Anolis", rownames(anoledata), sep = "_")

paste just sticks together Anolis with the names in anoles already with an underscore (_) separating them (sep = "_")

We then need to make sure the order of the species in the data matches that of the phylogeny.

anoledata <- anoledata[anoletree$tip.label, ]

Next we make a matrix containing only the SVL values for each Anolis species:

SVL <- as.matrix(anoledata)[,"AVG.SVL"]

This code selects only the variable AVG.SVL from anoledata (square brackets subset in R in the form [rows, columns]), and then uses as.matrix to make this into a matrix.

Take a look at SVL

head(SVL)
##        Anolis_ahli     Anolis_allogus Anolis_rubribarbus 
##           4.039125           4.040138           4.078469 
##       Anolis_imias      Anolis_sagrei     Anolis_bremeri 
##           4.099687           4.067162           4.113371

We then use the function contMap. contMap creates a tree with a mapped continuous character, i.e. where the value of the character is known at the tips, and estimated along the tree. The estimating of the character along the tree uses a Maximum Likelihood estimation procedure. Here we will tell contMap not to automatically plot the tree (using plot = FALSE) so we can make some modifications.

SVLplot <- contMap(anoletree, SVL, plot = FALSE)

For some reason this code breaks…

Finally let’s plot the tree as a fan (legend = 10 just spreads the legend out so it is readable).

plot(SVLplot, type = "fan", legend = 10)
## legend scale cannot be longer than total tree length; resetting

6. Phylomorphospace plots in R

We are going to use the Anolis data to create a phylomorphospace plot. These plots show where species fall within a morphospace (usually based on principal components analyses of morphological variables), and then reconstructs values for the nodes of the phylogeny so the whole phylogeny can be projected into the morphospace.

We already loaded the phylogeny and data as follows:

data(anoletree)
anoledata <- read.csv("anole.data.csv", header = TRUE, row.names = 1)

As we saw above, the names of anoles are the species names without the Genus. In the phylogeny the species names are Anolis_species. So to get the two to match we need to add Anolis_ to each name.

rownames(anoledata) <- paste("Anolis", rownames(anoledata), sep = "_")

paste just sticks together Anolis with the names in anoles already with an underscore (_) separating them (sep = "_")

Next we again need to make sure the order of the species in the data matches that of the phylogeny.

anoledata <- anoledata[anoletree$tip.label, ]

We can now run a phylogenetic principal components analysis on our morphological trait data and extract the PC scores for each PC axis using $S.

PC <- phyl.pca(anoletree, anoledata)$S

To plot the morphospace with colours, we need to define them. Here we choose six colours, and matched them with the ecomorph types listed here. These were built into anoletree.

colors <- setNames(c("cornflowerblue", "goldenrod", "chartreuse", "hotpink",
                     "orangered", "darkviolet"), 
                     sort(unique(getStates(anoletree, "tips"))))

Finally we can make the plot, and colour by ecomorph. label = "off" suppresses the printing of tip labels which keeps things a bit tidier.

phylomorphospace(anoletree, PC[,1:2], label = "off", node.by.map = TRUE, colors = colors)

7. Phylogenies and maps!

We can also plot phylogenies attached to maps showing where species come from. We will use the bird orders tree again as it’s small.

data(bird.orders)

We don’t know where the birds come from, so we will simulate some latitude and longitude data as follows. This uses a function that simulates traits along the phylogeny, so close relatives should end up with more similar latitude and longitude values. I’ve used high rates of evolution (sig2) to force the points to be spread out. Of course these values will be nonsensical for birds!

lat <- fastBM(bird.orders, sig2 = 100, bounds = c(-90, 90))
long <- fastBM(bird.orders, sig2 = 150, bounds = c(-180, 180))

Then we use the function phylo.to.map to match the locations with the world map. This produces some output in the form objective 98 etc. (I have suppressed it here).

phylomap <- phylo.to.map(bird.orders, cbind(lat,long), plot = FALSE)

We can either put the phylogeny next to the map and draw lines to the places they come from using type = "phylogram"

plot(phylomap,type = "phylogram", asp = 1.3, mar = c(0.1, 0.5, 3.1, 0.1))

…or plot the phylogeny directly onto the map using type = "direct"

plot(phylomap,type = "direct", asp = 1.3, mar = c(0.1, 0.5, 3.1, 0.1))

8. A new package for visualising trees: ggtree

ggtree is a newly released package that works with the popular ggplot framework in R. ggplot works by adding layers of different features together. A standard ggplot includes a layer that assigns the x and y variables then can have lots of other layers of points, lines, shapes and now with ggtree phylogenies.

To use ggtree you will have to install it, then load it using:

library(ggtree)

Let’s replicate some of the stuff we did above but using ggtree. As always choose whichever method you prefer when you’re doing this with your own data.

ggtree(fishtree) +
geom_text(aes(label=label), size = 1)
## Warning: Removed 61 rows containing missing values (geom_text).

Note that the layers here are the tree, and then the tip labels. Layers are separated by +

We can also zoom in on sections like with ape.

gzoom(fishtree, grep("Gymnothorax", fishtree$tip.label))

And we can have different shapes of phylogeny, different colours and even different line types.

ggtree(fishtree, color = "steelblue")

ggtree(fishtree, layout = "circular")

ggtree(fishtree, linetype = "dotted")

ggtree is really great for highlighting sections of a tree. First look at the nodes on your tree and pick an interesting one…

ggtree(fishtree) + 
  geom_text(aes(label = node))

Then highlight and/or annotate…

ggtree(fishtree) +
geom_hilight(node = 73, fill = "steelblue", alpha = 0.6) +
geom_hilight(node = 97, fill = "darkgreen", alpha = 0.6)

This package has a huge amount of promise, but still a few bugs and weird features (it’s designed by geneticists who are used to dealing with short species labels, hence the issues getting our species names to actually fit on the plot). The help files are also currently a bit sparse and some of the vignette examples don’t work. But if you’re likely to need to do any of this kind of stuff I recommend checking out the vignette and the paper:

Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2016, doi:10.1111/2041-210X.12628.